Sparse Additive Text Models with Low Rank Background

نویسنده

  • Lei Shi
چکیده

The sparse additive model for text modeling involves the sum-of-exp computing, whose cost is consuming for large scales. Moreover, the assumption of equal background across all classes/topics may be too strong. This paper extends to propose sparse additive model with low rank background (SAM-LRB) and obtains simple yet efficient estimation. Particularly, employing a double majorization bound, we approximate log-likelihood into a quadratic lower-bound without the log-sumexp terms. The constraints of low rank and sparsity are then simply embodied by nuclear norm and l1-norm regularizers. Interestingly, we find that the optimization task of SAM-LRB can be transformed into the same form as in Robust PCA. Consequently, parameters of supervised SAM-LRB can be efficiently learned using an existing algorithm for Robust PCA based on accelerated proximal gradient. Besides the supervised case, we extend SAM-LRB to favor unsupervised and multifaceted scenarios. Experiments on three real data demonstrate the effectiveness and efficiency of SAM-LRB, compared with a few state-of-the-art models.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sparse Additive Text Model with Low Rank Background

The sparse additive model for text modeling involves the sum-of-exp computing, whose cost is consuming for large scales. Moreover, the assumption of equal background across all classes/topics may be too strong. This paper extends to propose sparse additive model with low rank background (SAM-LRB) and obtains simple yet efficient estimation. Particularly, employing a double majorization bound, w...

متن کامل

Decomposition into Low-rank plus Additive Matrices for Background/Foreground Separation: A Review for a Comparative Evaluation with a Large-Scale Dataset

Background/foreground separation is the first step in video surveillance system to detect moving objects. Recent research on problem formulations based on decomposition into low-rank plus sparse matrices shows a suitable framework to separate moving objects from the background. The most representative problem formulation is the Robust Principal Component Analysis (RPCA) solved via Principal Com...

متن کامل

Subspace Representations of Unstructured Text

Since 1970 vector-space models have been used for information retrieval from unstructured text. The initial simple vector-space models suffered the same problems encountered today in searching the internet. These difficulties were significantly relieved by Latent Semantic Indexing (LSI), introduced in 1990 and improved through 1995. Starting with the simple vector-space model’s sparse term-by-d...

متن کامل

Robust PCA with compressed data

The robust principal component analysis (RPCA) problem seeks to separate lowrank trends from sparse outliers within a data matrix, that is, to approximate a n⇥d matrix D as the sum of a low-rank matrix L and a sparse matrix S. We examine the robust principal component analysis (RPCA) problem under data compression, where the data Y is approximately given by (L+S)·C, that is, a low-rank + sparse...

متن کامل

Additive Regularization of Topic Models for Topic Selection and Sparse Factorization

Probabilistic topic modeling of text collections is a powerful tool for statistical text analysis. Determining the optimal number of topics remains a challenging problem in topic modeling. We propose a simple entropy regularization for topic selection in terms of Additive Regularization of Topic Models (ARTM), a multicriteria approach for combining regularizers. The entropy regularization gradu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013